Off-line Handwriting Recognition from Forms
Abstract
A public domain optical character recognition (OCR) system has been developed by the National Institute of Standards and Technology (NIST) to provide a baseline of performance on off-line handwriting recognition from forms. The system’s source code, training data, and performance assessment tools are all publicly available. The system recognizes the handprint written on Handwriting Sample Forms as distributed on the CD-ROM, NIST Special Database 19. The public domain package contains a number of significant contributions to OCR technology, including an optimized Probabilistic Neural Network (PNN) classifier that runs 20 times faster than traditional software implementations of this algorithm. The modular design of the software makes it useful for training and testing set validation, multiple system voting schemes, and component evaluation and comparison. As an example, the OCR results from two versions of the recognition system (each using a different method of form removal) are presented and analyzed. It is shown that intelligent form removal can improve lowercase recognition by as much as 3%, but this net increase alone is insufficient to understand how form removal affects recognition. A method of analysis is provided whereby the local changes in performance (both gains and losses) of the system are automatically determined. This involves analyzing the distribution statistics of corresponding character confusion pairs between the two systems. As off-line handwriting recognition technology continues to improve, sophisticated analyses like this are necessary to reduce the errors remaining in complex recognition problems.
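The idea of looking at local gains and losses rather than a single net figure can be illustrated with a minimal Python sketch. It is not the NIST analysis itself, whose details the abstract does not give. It only assumes that each system's output can be aligned character by character with the reference labels. The function names `confusion_pairs` and `local_changes` and the toy strings are hypothetical and were made up for illustration.

```python
from collections import Counter


def confusion_pairs(truth, predicted):
    """Count (true_char, predicted_char) pairs for misclassified characters only."""
    return Counter((t, p) for t, p in zip(truth, predicted) if t != p)


def local_changes(truth, pred_a, pred_b):
    """Per-pair change in error count (system B minus system A).

    Negative values are local gains for B, positive values are local losses.
    Pairs whose count did not change are omitted.
    """
    errors_a = confusion_pairs(truth, pred_a)
    errors_b = confusion_pairs(truth, pred_b)
    deltas = {}
    for pair in set(errors_a) | set(errors_b):
        delta = errors_b[pair] - errors_a[pair]
        if delta != 0:
            deltas[pair] = delta
    return deltas


# Toy data, invented for illustration only (not NIST results).
truth = list("aaooccee")
pred_a = list("aoooecee")  # hypothetical system A: confuses a->o and c->e
pred_b = list("aaocccec")  # hypothetical system B: confuses o->c and e->c

deltas = local_changes(truth, pred_a, pred_b)
net_change = sum(deltas.values())

for (true_char, predicted_char), delta in sorted(deltas.items()):
    kind = "gain" if delta < 0 else "loss"
    print(f"{true_char} -> {predicted_char}: {delta:+d} ({kind})")
print("net change in errors:", net_change)
```

In this toy case both systems make the same number of errors, so the net change is zero even though four confusion pairs changed. That is the kind of local behavior a net recognition rate hides.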
Similar resources
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
A Full English Sentence Database for Off-line Handwriting Recognition
In this paper we present a new database for off-line handwriting recognition, together with a few preprocessing and text segmentation procedures. The database is based on the Lancaster-Oslo/Bergen (LOB) corpus. This corpus is a collection of texts that were used to generate forms, which subsequently were filled out by persons with their handwriting. Up to now (December 1998) the database include...
Strategies for Combining On-line and Off-line Information in an On-line Handwriting Recognition System
This paper investigates the cooperation of on-line and off-line handwriting word recognition systems. Our goal is to improve a mature on-line recognition system by exploiting the complementary information present in the off-line representation built from on-line signal. After describing the on-line and off-line HMM based handwriting recognition systems, we propose a formal framework, which allo...
From Off-line to On-line Handwriting Recognition
On-line handwriting includes more information on the time order of the writing signal and on the dynamics of the writing process than off-line handwriting. Therefore, on-line recognition systems achieve higher recognition rates. This can be concluded from results reported in the literature, and has been demonstrated empirically as part of this work. We propose a new approach for recovering the ...
An Empirical Evaluation of Off-line Arabic Handwriting And Printed Characters Recognition System
Handwriting recognition is a challenging task for many real-world applications such as document authentication, form processing, historical documents. This paper focuses on the comparative study on off-line handwriting recognition system and Printed Characters by taking Arabic handwriting. The off-line Handwriting Recognition methods for Arabic words which being often used among then across the...
Publication date: 2007